
    Creating Lexical Resources in TEI P5: A Schema for Multi-purpose Digital Dictionaries

    Although most of the relevant dictionary productions of the recent past have relied on digital data and methods, there is little consensus on formats and standards. The Institute for Corpus Linguistics and Text Technology (ICLTT) of the Austrian Academy of Sciences has been conducting a number of varied lexicographic projects, both digitising print dictionaries and creating genuinely digital lexicographic data. This data was designed to serve several purposes, of which machine-readability was only one; a second goal was interoperability with digital NLP tools. To achieve this, a uniform encoding system applicable across all the projects was developed. The paper describes the constraints imposed on the content models of the various elements of the TEI dictionary module and argues in favour of TEI P5 as an encoding system suited not only to representing digitised print dictionaries but also to NLP purposes.

    Modelling frequency data -- Methodological considerations on the relationship between dictionaries and corpora

    The research questions addressed in our paper stem from a bundle of linguistically focused projects which, among other activities, also create glossaries and dictionaries intended to be usable both by human readers and by particular NLP applications. The paper will comprise two parts: in the first, the authors will give a concise overview of the projects and their goals; the second will concentrate on encoding issues involved in the related dictionary production. Particular focus will be put on modelling an encoding scheme for statistical information on lexicographic data gleaned from digital corpora.

    Towards Finer Granularity in Metadata

    In early 2010, the Austrian Academy of Sciences’ ICLTT began an experiment in selective metadata creation for a medium-sized collection (<100 million tokens) of digitised periodicals. The project has two main objectives: (a) assigning basic structures, so-called divisions in TEI nomenclature, to previously digitised texts, thus creating a set of new digital objects, and (b) categorising these texts so that thematically organised sub-corpora can be created. An additional objective was to store the metadata as TEI headers. Attempts at streamlining metadata creation are legion, particularly in the library community. Tools for the job are often incorporated into workflow engines, which include commercial products (such as docWORKS[e] and C-3) as well as free products such as Goobi, which incorporates the metadata creation tool RusDML, and the Archivists’ Toolkit™. The experimental workflow being tested at the ICLTT is an attempt to capture detailed metadata for a comparatively large collection of digitised periodicals and other collective publications such as yearbooks, readers, commemorative publications, almanacs, and anthologies. While all higher-level digital objects in the corpus were furnished with metadata from the beginning of the digitisation process, the current experiment is designed to enrich this data so as to describe the contents of the material more fully. To this end, the department’s standard tools were adapted, which had the added benefit of keeping software production costs to a minimum. Whereas in earlier experiments our group of researchers (metadata creators) created the TEI header for each text division manually, we have since approached the problem by exploiting the contents sections of the digitised issues and/or other secondary sources, which has resulted in a tangible acceleration of the process.
Together with collecting basic data such as author, title, publication date, and creation date, the project classifies each division by text type and topic, the latter using the standard Dewey Decimal Classification (version 22, German) supplemented by keywords. This paper discusses a number of issues concerning the quality and type of the resulting data. It also touches upon the question of automation and at which points in the process human intervention is indispensable. Particular attention is given to the software module for creating TEI headers.

    Shaping Translation: A View from Terminology Research

    This article discusses translation-oriented terminology over a time frame roughly congruent with META’s life span. Against the backdrop of the place of terminology in shaping professional issues in translation, we first describe some stages in the process by which terminology has acquired an institutional identity in translator training programmes and constituted its knowledge base. We then propose a framework that seeks to show how theory construction in terminology has contributed to a better understanding of technical texts and their translation. A final section illustrates how this overarching theoretical scheme has driven, or is at least consistent with, products and methods in the translation sector of the so-called language industries.
    (Translated from the French abstract:) This article addresses terminology from the perspective of translation (profession, practice, theory) over the last fifty years, a period corresponding to the life of META. After outlining what the contemporary profile of the translator owes to terminology, the article examines in turn: (a) the stages in the constitution of this science of terms, (b) how this science gained recognition in translator training programmes, (c) the contemporary explanatory framework it offers for accounting for technical texts and their translation, and (d) the implications of this framework for those sectors of the language industries that largely justify themselves with reference to translation.

    The European Language Resources and Technologies Forum: Shaping the Future of the Multilingual Digital Europe

    Proceedings of the 1st FLaReNet Forum on the European Language Resources and Technologies, held in Vienna at the Austrian Academy of Sciences on 12-13 February 2009.